Skip to content

Cache CLI extractor paths across Actions steps#3950

Draft
mario-campos wants to merge 2 commits into
mainfrom
mario-campos/cache-cli-resolve-langs
Draft

Cache CLI extractor paths across Actions steps#3950
mario-campos wants to merge 2 commits into
mainfrom
mario-campos/cache-cli-resolve-langs

Conversation

@mario-campos

Copy link
Copy Markdown
Contributor

Similar to #3943, this PR caches the output of codeql resolve languages, which contains the paths to the various extractors so that repeated calls to resolveLanguages() are idempotent. Additionally, re-implement resolveExtractor() as a wrapper over resolveLanguages() (to re-use the cached output) rather than shell out to codeql resolve extractor.

In one experiment, I counted seven instances of shelling out to codeql resolve extractor. When you dig into the code, you can see why: resolveExtractor() is not called often or from many places; But one caller is isTracedLanguage(), which is wrapped by isScannedLanguage(). And these functions are often used in a loop/map over all/some languages. This can explain why we see consecutive executions of codeql resolve extractor.

Risk assessment

For internal use only. Please select the risk level of this change:

  • Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Which use cases does this change impact?

Workflow types:

  • Advanced setup - Impacts users who have custom CodeQL workflows.
  • Managed - Impacts users with dynamic workflows (Default Setup, Code Quality, ...).

Products:

  • Code Scanning - The changes impact analyses when analysis-kinds: code-scanning.
  • Code Quality - The changes impact analyses when analysis-kinds: code-quality.
  • Other first-party - The changes impact other first-party analyses.
  • Third-party analyses - The changes affect the upload-sarif action.

Environments:

  • Dotcom - Impacts CodeQL workflows on github.com and/or GitHub Enterprise Cloud with Data Residency.
  • GHES - Impacts CodeQL workflows on GitHub Enterprise Server.
  • Testing/None - This change does not impact any CodeQL workflows in production.

How did/will you validate this change?

  • Unit tests - I am depending on unit test coverage (i.e. tests in .test.ts files).
  • End-to-end tests - I am depending on PR checks (i.e. tests in pr-checks).
  • Other - Manual/local testing

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

  • Feature flags - All new or changed code paths can be fully disabled with corresponding feature flags.
  • Rollback - Change can only be disabled by rolling back the release or releasing a new version with a fix.
  • Development/testing only - This change cannot cause any failures in production.
  • Other - Please provide details.

How will you know if something goes wrong after this change is released?

  • Telemetry - I rely on existing telemetry or have made changes to the telemetry.
    • Dashboards - I will watch relevant dashboards for issues after the release. Consider whether this requires this change to be released at a particular time rather than as part of a regular release.
    • Alerts - New or existing monitors will trip if something goes wrong with this change.
  • Other - Please provide details.

Are there any special considerations for merging or releasing this change?

  • No special considerations - This change can be merged at any time.
  • Special considerations - This change should only be merged once certain preconditions are met. Please provide details of those or link to this PR from an internal issue.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Consider adding a changelog entry for this change.
  • Confirm the readme and docs have been updated if necessary.

Repeated calls to `resolveLanguages()` will only pay the performance penalty of executing `codeql resolve languages` once.
By wrapping `resolveLanguages()`, which is memoized, we can avoid executing `codeql resolve extractor` several times over the course of an analysis.
@github-actions github-actions Bot added the size/S Should be easy to review label Jun 4, 2026

@henrymercer henrymercer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching these invocations makes a lot of sense! I have a high level comment and a couple of lower level comments.

The main point is that now that we're caching multiple invocations, it might be a good opportunity to generalise the design. For instance, you could imagine something like:

const versionCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_VERSION_INFO, validate: isVersionInfo });
const resolveLanguagesCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_RESOLVE_LANGUAGES, validate: isResolveLanguagesOutput });

where createPersistedCliCache handles memoising in the Action and persisting between Actions steps with an environment variable.

Some smaller things:

  • Ideally the cache entry would also depend on getExtraOptionsFromEnv(["resolve","languages"])
  • We should remove the cache in testing-utils.ts like we do for the CodeQL version cache

Comment thread src/codeql.ts
// This can be a bit slow due to the JVM startup cost. Instead, get
// the extractor path from resolveLanguages(), which caches its output.
const extractors = await this.resolveLanguages();
return extractors[language][0];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this correctly handle language aliases? We do currently normalise languages to their original names, but it would be good to at least document this change of behaviour if not resolve aliases.

Also, we should make sure that extractors[language] is defined and if not, error nicely.

@mbg mbg left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @henrymercer's comments regarding a more generalised design for this. I am wondering about the use of environment variables here vs using a file on disk. I don't know if you have already considered this, but we store e.g. the Action configuration on disk as a file. Perhaps that would make sense for these cached CLI results as well.

A general point: could we also make sure to add doc comments for new top-level definitions before merging?

Comment thread src/environment.ts
CODEQL_VERSION_INFO = "CODEQL_ACTION_CLI_VERSION_INFO",

/**
* `ResolveLanguagesOutput` for the CodeQL CLI, so later Actions steps can reuse it instead of

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: ResolveLanguagesOutput is a type, but the environment variable presumably stores some representation of a value of that type. For example:

Suggested change
* `ResolveLanguagesOutput` for the CodeQL CLI, so later Actions steps can reuse it instead of
* Used to store a stringified JSON representation of a `ResolveLanguagesOutput` value for the CodeQL CLI, so later Actions steps can reuse it instead of

Comment thread src/util.ts
Comment on lines +730 to +740
function isPersistedResolveLanguagesOutput(
value: unknown,
): value is PersistedResolveLanguagesOutput {
return (
typeof value === "object" &&
value !== null &&
typeof (value as Record<string, unknown>).cmd === "string" &&
typeof (value as Record<string, unknown>).output === "object" &&
(value as Record<string, unknown>).output !== null
);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this mirrors what was done in #3943, but it would probably make sense to use the helper functions from the ./json module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Should be easy to review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants